{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "is_executing": false
    }
   },
   "source": [
    "# Softmax,  log_softmax, NLLLoss and CrossEntropy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is Sofmax\n",
    "\n",
    "**函数Softmax(x)**: 输入一个实数向量并返回一个概率分布。定义 x 是一个实数的向量(正数或负数都可以)。 然后, 第i个 Softmax(x) 的计算方式为：\n",
    "$$\\frac{{\\exp ({x_i})}}{{\\sum\\nolimits_j {\\exp ({x_j})} }}$$\n",
    "\n",
    "输出是一个概率分布: 每个元素都是非负的, 并且所有元素的总和都是1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<torch._C.Generator at 0x1ab80351250>"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch\n",
    "import torch.nn.functional as F\n",
    "\n",
    "torch.manual_seed(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**在图片分类问题中，输入m张图片，输出一个m\\*N的Tensor，其中N是分类类别总数。比如输入2张图片，分三类，最后的输出是一个2*3的Tensor，举个例子：**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[ 1.5410, -0.2934, -2.1788],\n",
      "        [ 0.5684, -1.0845, -1.3986]])\n"
     ]
    }
   ],
   "source": [
    "output = torch.randn(2, 3)\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第1,2行分别是第1,2张图片的结果，假设第123列分别是猫、狗和猪的分类得分。\n",
    "\n",
    "可以看出模型认为两张都更可能是猫。\n",
    "然后对每一行使用Softmax，这样可以得到每张图片的概率分布。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[0.8446, 0.1349, 0.0205],\n",
      "        [0.7511, 0.1438, 0.1051]])\n"
     ]
    }
   ],
   "source": [
    "print(F.softmax(output, dim=1))\n",
    "# 这里dim的意思是计算Softmax的维度，这里设置dim=1，可以看到每一行的加和为1。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is log_softmax\n",
    "\n",
    "这个很好理解，其实就是对softmax处理之后的结果执行一次对数运算。\n",
    "\n",
    "可以理解为 log(softmax(output))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[-0.1689, -2.0033, -3.8886],\n",
      "        [-0.2862, -1.9392, -2.2532]])\n",
      "tensor([[-0.1689, -2.0033, -3.8886],\n",
      "        [-0.2862, -1.9392, -2.2532]])\n"
     ]
    }
   ],
   "source": [
    "print(F.log_softmax(output, dim=1))\n",
    "print(torch.log(F.softmax(output, dim=1)))\n",
    "# 输出结果是一致的"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is NLLLoss？\n",
    "\n",
    "该函数的全程是negative log likelihood loss. 若$x_i=[q_1, q_2, ..., q_N]$ 为神经网络对第i个样本的输出值，$y_i$为真实标签。则：\n",
    "$$f(x_i,y_i)=-q_{y_i}$$\n",
    "\n",
    "输入：log_softmax(output), target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(1.2000)\n"
     ]
    }
   ],
   "source": [
    "print(F.nll_loss(torch.tensor([[-1.2, -2, -3]]), torch.tensor([0])))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 通常我们结合 log_softmax 和 nll_loss一起用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Output is [1.2, 2, 3]. If the target is 0, loss is: tensor(2.2273)\n",
      "Output is [1.2, 2, 3]. If the target is 1, loss is: tensor(1.4273)\n",
      "Output is [1.2, 2, 3]. If the target is 2, loss is: tensor(0.4273)\n"
     ]
    }
   ],
   "source": [
    "output = torch.tensor([[1.2, 2, 3]])\n",
    "\n",
    "target = torch.tensor([0])\n",
    "log_sm_output = F.log_softmax(output, dim=1)\n",
    "print('Output is [1.2, 2, 3]. If the target is 0, loss is:', F.nll_loss(log_sm_output, target))\n",
    "\n",
    "target = torch.tensor([1])\n",
    "log_sm_output = F.log_softmax(output, dim=1)\n",
    "print('Output is [1.2, 2, 3]. If the target is 1, loss is:', F.nll_loss(log_sm_output, target))\n",
    "\n",
    "target = torch.tensor([2])\n",
    "log_sm_output = F.log_softmax(output, dim=1)\n",
    "print('Output is [1.2, 2, 3]. If the target is 2, loss is:', F.nll_loss(log_sm_output, target))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  在分类问题中，CrossEntropy等价于log_softmax 结合 nll_loss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$N$分类问题，对于一个特定的样本，已知其真实标签，CrossEntropy的计算公式为：\n",
    "\n",
    "$$cross\\_entropy=-\\sum_{k=1}^{N}\\left(p_{k} * \\log q_{k}\\right)$$\n",
    "\n",
    "其中p表示真实值，在这个公式中是one-hot形式；**q是经过softmax计算后的结果， $q_k$为神经网络认为该样本为第$k$类的概率。**\n",
    "\n",
    "仔细观察可以知道，因为p的元素不是0就是1，而且又是乘法，所以很自然地我们如果知道1所对应的index，那么就不用做其他无意义的运算了。所以在pytorch代码中target不是以one-hot形式表示的，而是直接用scalar表示。若该样本的真实标签为$y$,则交叉熵的公式可变形为：\n",
    "\n",
    "$$cross\\_entropy=-\\sum_{k=1}^{N}\\left(p_{k} * \\log q_{k}\\right)=-log \\, q_{y}$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(2.2273)\n"
     ]
    }
   ],
   "source": [
    "output = torch.tensor([[1.2, 2, 3]])\n",
    "target = torch.tensor([0])\n",
    "\n",
    "log_sm_output = F.log_softmax(output, dim=1)\n",
    "nll_loss_of_log_sm_output = F.nll_loss(log_sm_output, target)\n",
    "print(nll_loss_of_log_sm_output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(2.2273)\n"
     ]
    }
   ],
   "source": [
    "output = torch.tensor([[1.2, 2, 3]])\n",
    "target = torch.tensor([0])\n",
    "\n",
    "ce_loss = F.cross_entropy(output, target)\n",
    "print(ce_loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# More about softmax"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.02590865 0.11611453 0.85797681]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "def softmax(x):\n",
    "    x_exp = np.exp(x)\n",
    "    return x_exp / np.sum(x_exp)\n",
    "\n",
    "output = np.array([0.1, 1.6, 3.6])\n",
    "print(softmax(output))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.22916797 0.3093444  0.46148762]\n"
     ]
    }
   ],
   "source": [
    "def softmax_t(x, t):\n",
    "    x_exp = np.exp(x / t)\n",
    "    return x_exp / np.sum(x_exp)\n",
    "\n",
    "output = np.array([0.1, 1.6, 3.6])\n",
    "print(softmax_t(output, 5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.33327778 0.33332777 0.33339445]\n"
     ]
    }
   ],
   "source": [
    "print(softmax_t(output, 10000))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
